Project 1
Project Description
Sergei Mikhailovich Prokudin-Gorskii (1863-1944) was a pioneering photographer of the early 20th century. He envisioned color photography in a monochrome world and acted on that vision by recording three exposures of every scene onto a glass plate, using a red, a green, and a blue filter. His photos depict the last years of the Russian Empire. They were purchased by the Library of Congress in 1948 and have since been digitized, rendered in color, and made public online.
The goal of this project is to take the digitized Prokudin-Gorskii glass plate images (three monochrome exposures per plate) and, using image processing techniques, automatically produce a single color image with as few visual artifacts as possible.
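As a rough illustration of the input and output, the sketch below splits one scanned plate into its three exposures and stacks them into a color image. It assumes the plate is a 2D NumPy array with the exposures stacked vertically in blue, green, red order from top to bottom; the function names `split_plate` and `stack_rgb` are hypothetical and are not taken from the project code.

```python
import numpy as np


def split_plate(plate):
    """Split a vertically stacked plate scan into (blue, green, red).

    Assumes the three exposures have equal height and appear in
    blue, green, red order from top to bottom.
    """
    height = plate.shape[0] // 3  # leftover rows at the bottom are dropped
    blue = plate[:height]
    green = plate[height:2 * height]
    red = plate[2 * height:3 * height]
    return blue, green, red


def stack_rgb(red, green, blue):
    """Combine three equally sized monochrome channels into an H x W x 3 image."""
    return np.dstack([red, green, blue])
```

Stacking the channels without any alignment typically shows colored fringes, which is the artifact the rest of the project tries to remove.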
The main challenge is image alignment: it was rarely possible for the subject to remain stationary while Prokudin-Gorskii took his three monochrome exposures, so the channels do not line up exactly. This is also the key flaw in his strategy.
Small Size Image Alignment (exhaustive search)
For small images (jpg format), where each monochrome photo usually has around 120k pixels, I used exhaustive search to find the best alignment. Using the blue channel as the reference, I aligned the red and green channels by circularly shifting each one over a search window of [-15, 15] pixels along both the x and y axes and keeping the displacement with the best similarity score, measured by either the sum of squared differences (SSD) or normalized cross-correlation (NCC). In addition, I implemented an option to crop 2% from all four sides of each monochrome photo before alignment, in order to reduce the influence of the black borders.
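The exhaustive search described above can be sketched as follows. This is a minimal illustration rather than the project's actual code: the function names, the (row, column) ordering of the returned displacement, and the particular form of NCC (a dot product of the two images after scaling each to unit norm) are assumptions.

```python
import numpy as np


def crop_border(img, frac=0.02):
    """Trim `frac` of the height and width from each of the four sides."""
    h, w = img.shape
    dy, dx = int(h * frac), int(w * frac)
    return img[dy:h - dy, dx:w - dx]


def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    return float(np.sum((a - b) ** 2))


def ncc(a, b):
    """Dot product of the unit-normalized images; higher means more similar."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.sum(a * b) / denom) if denom > 0 else 0.0


def align_exhaustive(channel, reference, radius=15, metric="ssd", crop_frac=0.0):
    """Return the (dy, dx) circular shift of `channel` that best matches `reference`."""
    # Work in floating point so uint8 inputs do not overflow when subtracted.
    channel = np.asarray(channel, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    if crop_frac > 0:
        channel = crop_border(channel, crop_frac)
        reference = crop_border(reference, crop_frac)

    best_shift, best_cost = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            shifted = np.roll(channel, (dy, dx), axis=(0, 1))
            # NCC is negated so that both metrics are minimized.
            if metric == "ssd":
                cost = ssd(shifted, reference)
            else:
                cost = -ncc(shifted, reference)
            if cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift
```

In this sketch the displacement is estimated on the cropped copies, and the result would then be applied to the full, uncropped channel with `np.roll(channel, shift, axis=(0, 1))` before stacking the channels.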
In my experience, the two similarity metrics produce the same results, but the choice of whether to crop does make a difference. Below are example small images, listed with the best green and red channel displacements found with and without cropping.
cathedral
no crop: green translation [1, -1], red translation [7, -1]
crop: green translation [5, 2], red translation [11, 3]
monastery
no crop: green translation [-6, 0], red translation [9, 1]
crop: green translation [-3, 0], red translation [3, 2]
tobolsk
no crop: green translation [3, 2], red translation [6, 3]
crop: green translation [3, 2], red translation [6, 3]
As shown above, cropping performs better than no cropping in cathedral and monastery. The two methods perform equally well in tobolsk.
Large Size Image Alignment (pyramid search)
For large images (tif format), where each monochrome photo usually has around 10 million pixels, exhaustive search becomes prohibitively expensive. To address this, I implemented an image pyramid algorithm.
After dividing the original Prokudin-Gorskii plate into three monochrome images, my image pyramid algorithm first downscales each image to the coarsest level, where the total number of pixels is less than 90k. At the coarsest level, I use exhaustive search, the same method used for small image alignment, in a window of [-15, 15]. I then move back up the pyramid, upscaling by a factor of 2 at each step until the image reaches its original size. At each step, I first double the best displacement found at the previous, coarser level, then refine it within a search window of [-5, 5] around that doubled value.
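The coarse-to-fine procedure can be sketched as below. This is an illustrative version under stated assumptions, not the project's implementation: it uses SSD only, halves the resolution by averaging 2x2 pixel blocks (the write-up does not say how downscaling is done), and the names `downscale2`, `search`, and `align_pyramid` are hypothetical.

```python
import numpy as np


def ssd(a, b):
    """Sum of squared differences; lower means more similar."""
    return float(np.sum((a - b) ** 2))


def search(channel, reference, center, radius):
    """Best (dy, dx) circular shift within `radius` of `center`, scored by SSD."""
    best_shift, best_cost = center, np.inf
    for dy in range(center[0] - radius, center[0] + radius + 1):
        for dx in range(center[1] - radius, center[1] + radius + 1):
            cost = ssd(np.roll(channel, (dy, dx), axis=(0, 1)), reference)
            if cost < best_cost:
                best_shift, best_cost = (dy, dx), cost
    return best_shift


def downscale2(img):
    """Halve the resolution by averaging 2x2 blocks (odd edges are trimmed)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))


def align_pyramid(channel, reference, min_pixels=90_000,
                  coarse_radius=15, fine_radius=5):
    """Coarse-to-fine alignment; returns the (dy, dx) shift at full resolution."""
    channel = np.asarray(channel, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Build the pyramid: keep halving until the image has fewer than min_pixels.
    levels = [(channel, reference)]
    while levels[-1][0].size >= min_pixels:
        c, r = levels[-1]
        levels.append((downscale2(c), downscale2(r)))

    # Exhaustive search at the coarsest level.
    c, r = levels[-1]
    shift = search(c, r, (0, 0), coarse_radius)

    # Walk back up: double the estimate, then refine it in a small window.
    for c, r in reversed(levels[:-1]):
        shift = search(c, r, (2 * shift[0], 2 * shift[1]), fine_radius)
    return shift
```

The appeal of this design is that each finer level only evaluates an 11 x 11 grid of candidate shifts around the doubled estimate, instead of a window wide enough to cover the full displacement at that resolution.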
As with small image alignment, I incorporated SSD and NCC as the two similarity metrics, as well as the option to crop 2% from all four sides of each monochrome photo before alignment. According to the results, cropping always produces an alignment at least as good as not cropping. In addition, the two similarity metrics show no significant difference in performance. Generating each color image takes roughly 20 seconds.